How to evaluate the legality of web scraping for your research
Researchers require web scraping to generate data about the digital economy.
“Most of the 313 identified web data-based articles rely on web scraping (59%); APIs are used much more sparingly (12%), and some articles combine web scraping and APIs (9%). The remaining articles, especially netnographic work, use web data but tend to extract it manually (20%)[4].”
This presentation touches upon the various legal challenges faced by academic researchers (and businesses alike).
Background: LinkedIn attempts to block hiQ from web scraping publicly available data on its platform.
Key Takeaways:
Implications:
Database rights are a subset of copyright. A database is an organized collection of materials that allows users to search and access individual pieces of information.
Copyright law protects databases when the way the data is selected or arranged is original and creative. Therefore, scraping must not result in copying and then, for example, republishing the original database’s structure (or a substantial part of it)[12].
Non-original databases can also be protected if a significant investment was made in obtaining, verifying, and presenting the data[13].
When scraping a data source that may be subject to database rights, consider:
A Terms of Service (ToS), Terms of Use, or Privacy Policy can be found on almost every website.
This raises the following question: do we have to abide by a website’s Terms of Service?
The answer depends on the kind of ToS that is in place:
| Clickwrap: ToS that you must explicitly agree to. | Browsewrap: ToS that are buried on the site. |
|---|---|
| If you have to explicitly agree to the ToS in any way (such as by logging in, clicking ‘I agree’ or ‘OK’, or downloading the app), these are clickwrap ToS. | These ToS are usually accessible via a link at the bottom of a webpage. |
| You are informed of the existence of the ToS, and you are actively agreeing to them. | They state that you agree to the terms simply by using or browsing the site. |
| Courts have ruled that your explicit agreement creates a binding contract that you must follow. | Most courts have ruled that this type of ToS is unenforceable, so even if the terms forbid you from using the service, you may not be in violation of them[9]. |
In all other cases, the company or organisation must obtain the data subject’s permission (known as “consent”) before collecting or reusing their personal information[16].
In addition to establishing a lawful reason for scraping data, you should also consider the type of personal data being collected. Sensitive data is subject to additional rules, and explicit consent is required before this data can be scraped and stored.
Sensitive data includes:
Therefore, you should avoid scraping this data unless you have explicit consent and a legitimate reason to do so[17].
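One practical way to follow this advice is to drop sensitive fields from each scraped record before anything is written to disk. The sketch below is only an illustration under stated assumptions: the field names in `SENSITIVE_FIELDS` and the helper `strip_sensitive` are hypothetical placeholders, not a complete or authoritative list, and they should be adapted to the schema of your own scraper and to the rules that apply to your project.

```python
# Hypothetical field names chosen for illustration only; adapt them to
# the schema your scraper actually produces.
SENSITIVE_FIELDS = {
    "religion",
    "health_condition",
    "political_affiliation",
    "sexual_orientation",
}


def strip_sensitive(record, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of `record` without keys listed in `sensitive_fields`.

    Keys are compared case-insensitively; the input dict is not modified.
    """
    return {
        key: value
        for key, value in record.items()
        if key.lower() not in sensitive_fields
    }


# Example record, as a scraper might produce it (values are made up).
scraped = {
    "username": "user123",
    "job_title": "Analyst",
    "Religion": "example value",
}

cleaned = strip_sensitive(scraped)
print(cleaned)  # {'username': 'user123', 'job_title': 'Analyst'}
```

Filtering by field name only catches sensitive data stored in clearly labelled fields. Free-text fields (such as profile descriptions) can still contain sensitive information and need separate review.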
If you process data, you have to do so according to the data protection principles:
Even when you have received explicit consent from the data subject, you need to ensure that the correct data retention and access policies are in place:
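As an illustration of what enforcing a retention policy might look like in code, the sketch below discards records that are older than a fixed retention window. The 365-day period, the `collected_at` field, and the `purge_expired` helper are all assumptions made for this example. The actual retention period should come from your own data management plan or ethics approval.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention period for illustration; use the period set out in
# your own data management plan.
RETENTION_PERIOD = timedelta(days=365)


def purge_expired(records, now=None, retention=RETENTION_PERIOD):
    """Return only the records collected within the retention window.

    Each record is expected to carry a timezone-aware `collected_at`
    datetime (a hypothetical field name).
    """
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["collected_at"] <= retention]


# Fixed reference time so that the example is reproducible.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "collected_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime(2022, 6, 1, tzinfo=timezone.utc)},
]

kept = purge_expired(records, now=now)
print([r["id"] for r in kept])  # [1]
```

In practice, the purge would run on a schedule against the stored dataset, and access to the remaining records would be limited to the researchers named in the project.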
Residential proxies provide real IP addresses of actual devices. When using a residential IP for scraping (or even just accessing web pages), you appear to be accessing websites and social media platforms from an actual home-based IP[18].
When evaluating the legality of web scraping, consider territorial scope so that you comply with the laws of every relevant jurisdiction.